Travel Package Purchase Prediction - Ensemble Techniques

Background and Context

You are a Data Scientist for a tourism company named "Visit with us". The Policy Maker of the company wants to establish a viable business model to expand the customer base.

A viable business model is a central concept that helps us understand the existing ways of doing business and how to change them for the benefit of the tourism sector.

One of the ways to expand the customer base is to introduce a new offering of packages.

Currently, the company offers 5 types of packages: Basic, Standard, Deluxe, Super Deluxe, and King. Looking at last year's data, we observed that 18% of the customers purchased the packages.

However, the marketing cost was quite high because customers were contacted at random without looking at the available information.

The company is now planning to launch a new product, the Wellness Tourism Package. Wellness Tourism is defined as travel that allows the traveler to maintain, enhance or kick-start a healthy lifestyle, and support or increase one's sense of well-being.

However, this time the company wants to harness the available data of existing and potential customers to make the marketing expenditure more efficient.

As a Data Scientist at the "Visit with us" travel company, you have to analyze the customers' data and information to provide recommendations to the Policy Maker and Marketing Team, and also build a model to predict which potential customers are going to purchase the newly introduced travel package.

Objective

Data Dictionary

Customer details:

Customer interaction data:

Let's start coding!

Installing necessary library

Loading necessary libraries

Load the data for analysis

Shape of the data

Let's view a sample of the dataset.

Duplicate values in the data

Missing values (count) in the data.

Missing values (Percentage) in the data
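
A minimal sketch of how the missing-value percentage per column can be computed with pandas. The toy frame below is made up for illustration and only borrows a few column names from the dataset.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real dataset (values are made up).
df = pd.DataFrame({
    "Age": [34, np.nan, 41, 29],
    "DurationOfPitch": [8.0, 12.0, np.nan, np.nan],
    "Gender": ["Male", "Female", "Male", "Female"],
})

# isnull() gives a boolean frame; the column mean is the fraction missing.
missing_pct = (df.isnull().mean() * 100).round(2)
print(missing_pct.sort_values(ascending=False))
```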

Unique values (count) in the data.

Data types of the columns in the dataset.

Summary of the dataset

Observation

Data Preprocessing

CustomerID

The CustomerID column is a sequential number assigned to each customer and adds no value to the analysis, so we drop it.

Gender

As seen above, the Gender column has three unique values. Let's analyze them.

We can see that there is a typo in one of the Gender values that we need to correct: "Female" and "Fe male" are the same category.
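
One way to merge the two spellings is a simple replace. This is only a sketch: the exact spelling of the typo in the raw data ("Fe Male" here) is an assumption, so check `df["Gender"].unique()` first.

```python
import pandas as pd

# Toy data; the typo spelling "Fe Male" is an assumption for illustration.
df = pd.DataFrame({"Gender": ["Male", "Female", "Fe Male", "Female"]})

# Map the misspelled label onto the correct one; other values are untouched.
df["Gender"] = df["Gender"].replace({"Fe Male": "Female"})
print(df["Gender"].value_counts())
```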

PreferredPropertyStar

It appears that Designation and CityTier play a role in PreferredPropertyStar. Let's fill the missing values using these columns.
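
A possible way to do this is a group-wise fill. The sketch below fills each missing PreferredPropertyStar with the most frequent value within the same Designation and CityTier group. Using the mode (rather than, say, the median) is an assumption, and the data is made up.

```python
import numpy as np
import pandas as pd

# Toy data (made up) with two missing star ratings.
df = pd.DataFrame({
    "Designation": ["Manager", "Manager", "Manager",
                    "Executive", "Executive", "Executive"],
    "CityTier": [1, 1, 1, 3, 3, 3],
    "PreferredPropertyStar": [3.0, 3.0, np.nan, 5.0, np.nan, 5.0],
})

def fill_with_group_mode(s):
    """Fill NaNs in one group with that group's most frequent value."""
    mode = s.mode()  # mode() ignores NaN by default
    # If the whole group is missing there is no mode; leave it unchanged.
    return s.fillna(mode.iloc[0]) if not mode.empty else s

df["PreferredPropertyStar"] = (
    df.groupby(["Designation", "CityTier"])["PreferredPropertyStar"]
      .transform(fill_with_group_mode)
)
print(df)
```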

TypeofContact

Self Enquiry is the most frequent value. Let's fill the missing values with Self Enquiry.

NumberOfFollowups

It seems that ProductPitched and MaritalStatus partly determine the number of follow-ups. Customers with Single or Unmarried status have 3 follow-ups; otherwise it is 4. Let's use this to fill the missing values.
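
The rule above can be sketched as a conditional fill. The data below is made up, and only the MaritalStatus part of the rule is used.

```python
import numpy as np
import pandas as pd

# Toy data (made up): four missing follow-up counts and one observed value.
df = pd.DataFrame({
    "MaritalStatus": ["Single", "Married", "Unmarried", "Divorced", "Married"],
    "NumberOfFollowups": [np.nan, np.nan, np.nan, np.nan, 5.0],
})

# 3 follow-ups for Single/Unmarried customers, 4 for everyone else.
is_single = df["MaritalStatus"].isin(["Single", "Unmarried"])
fallback = pd.Series(np.where(is_single, 3.0, 4.0), index=df.index)

# fillna aligns on the index, so observed values are left as they are.
df["NumberOfFollowups"] = df["NumberOfFollowups"].fillna(fallback)
print(df)
```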

NumberOfChildrenVisiting

Assuming the value is missing because these customers don't have any children, let's fill it with 0.

NumberOfTrips

Customers with MaritalStatus Single have 2 visits on average; the rest have 3 visits.

Age

Designation changes with experience, and experience increases with age. I believe an individual's age is somewhat related to their designation; designation sometimes also depends on organization type and gender. With this in mind, we will try to fill the missing values for Age.

MonthlyIncome

As we already know, MonthlyIncome is directly related to the individual's position in the organization, how big the organization is, the gender of the individual, and the city they work in (salaries are higher in metro cities and slightly lower in small cities).

Let's fill the missing values using the groupby technique above.
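
A rough sketch of a group-wise median fill for MonthlyIncome. The grouping columns (Designation, Gender, CityTier) and the fallback to the overall median are assumptions made for illustration, and the data is made up.

```python
import numpy as np
import pandas as pd

# Toy data (made up). The VP group has no observed income at all.
df = pd.DataFrame({
    "Designation": ["Manager", "Manager", "Manager",
                    "Executive", "Executive", "VP"],
    "Gender": ["Male", "Male", "Male", "Female", "Female", "Female"],
    "CityTier": [1, 1, 1, 3, 3, 1],
    "MonthlyIncome": [22000.0, 24000.0, np.nan, 17000.0, np.nan, np.nan],
})

# Overall median of the observed incomes, used as a last resort.
overall_median = df["MonthlyIncome"].median()

# Median income of each (Designation, Gender, CityTier) group, per row.
group_cols = ["Designation", "Gender", "CityTier"]
group_median = df.groupby(group_cols)["MonthlyIncome"].transform("median")

# First fill from the group; groups with no data fall back to the overall median.
df["MonthlyIncome"] = (
    df["MonthlyIncome"].fillna(group_median).fillna(overall_median)
)
print(df)
```

The same pattern could be reused for Age by swapping the target column and the grouping columns.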

DurationOfPitch

5.14% of the values are missing in this field. Let's try to fill them.

As we can see above, DurationOfPitch depends slightly on ProductPitched, NumberOfTrips and TypeofContact. Let's use these to fill the missing values.

Let's check whether any missing values are still present.

All the missing values are now replaced with values derived from the other columns.

Checking datatypes information

Fixing DataType

ProdTaken, TypeofContact, CityTier, Occupation, Gender, ProductPitched, MaritalStatus, Passport, Designation, OwnCar and PitchSatisfactionScore are categorical variables. Let's fix the datatypes.
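
A minimal sketch of the conversion, shown on a made-up frame that has only a few of the columns listed above.

```python
import pandas as pd

# Toy frame (made up) with a subset of the columns.
df = pd.DataFrame({
    "ProdTaken": [0, 1, 0],
    "CityTier": [1, 3, 2],
    "Gender": ["Male", "Female", "Male"],
    "Passport": [1, 0, 0],
    "Age": [34.0, 41.0, 29.0],
})

# Columns to treat as categorical; numeric columns such as Age are left alone.
cat_cols = ["ProdTaken", "CityTier", "Gender", "Passport"]
df = df.astype({col: "category" for col in cat_cols})
print(df.dtypes)
```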

EDA & Data Visualization

Univariate Analysis

Observation

Observation

Bivariate & Multivariate Analysis

Observation

EDA - When customer has taken the product

Customer profile (characteristics of a customer) of the different packages

Create count plot on the basis of different columns

Observation

Create AgeGroup for product analysis on the basis of Age
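
One way to build such a grouping is `pd.cut`. The bin edges and labels below are hypothetical; the notebook's actual age bands are not shown here.

```python
import pandas as pd

# Toy ages (made up).
df = pd.DataFrame({"Age": [18, 25, 31, 40, 47, 59]})

# Hypothetical bin edges; intervals are right-closed, e.g. (17, 25].
bins = [17, 25, 35, 45, 55, 65]
labels = ["18-25", "26-35", "36-45", "46-55", "56-65"]

df["AgeGroup"] = pd.cut(df["Age"], bins=bins, labels=labels)
print(df)
```

The same approach should work for an income grouping by swapping in MonthlyIncome and suitable bin edges.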

AgeGroup VS ProductPitched

Observation

Designation VS ProductPitched

Observation

AgeGroup VS Designation

Observation

AgeGroup VS Occupation

Observation

Occupation vs ProductPitched

Observation

Observation

Create MonthlyIncomeGroup for product analysis on the basis of MonthlyIncome

MonthlyIncomeGroup VS ProductPitched

Observation

MonthlyIncomeGroup VS AgeGroup

Observation

EDA Observation - Customer Profiles who took the product

Basic

Deluxe

King

King

Standard

Standard

Relationship between different fields

Observation

Model Building

Outlier Treatment

Data Preparation

Creating training and test sets.

Splitting data into training and test sets

The stratify argument maintains the original distribution of classes in the target variable while splitting the data into train and test sets.
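
A small sketch of a stratified split with scikit-learn. The data is made up (20% positives), and the 70/30 split ratio and `random_state` are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy data (made up): 100 rows, 20 of them in the positive class.
X = pd.DataFrame({"feature": np.arange(100)})
y = pd.Series([1] * 20 + [0] * 80, name="ProdTaken")

# stratify=y keeps the class ratio the same in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y
)

print(y_train.value_counts(normalize=True))
print(y_test.value_counts(normalize=True))
```

Both splits keep the 20% positive rate of the full toy target.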

Building the model

Model can make wrong predictions as:

Which case is more important?

How to reduce this loss, i.e. the need to reduce False Negatives?
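
A common metric for tracking False Negatives is recall, TP / (TP + FN): the fewer buyers the model misses, the higher the recall. The sketch below uses made-up labels and predictions to show how it can be read off the confusion matrix.

```python
from sklearn.metrics import confusion_matrix, recall_score

# Made-up labels: 1 = customer purchased the package, 0 = did not.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]

# For binary labels, ravel() returns the counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

# Recall computed by hand; recall_score gives the same number.
recall = tp / (tp + fn)
print("False negatives:", fn)
print("Recall:", recall, recall_score(y_true, y_pred))
```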

Utility methods

Model building - Bagging

Decision Tree Classifier

Visualizing Decision Tree

Observation

Random Forest Classifier

Observation

Bagging Classifier

Observation

Model performance improvement (Hyperparameter Tuning) - Bagging

Decision Tree

Visualizing Decision Tree

Observation

Random Forest

Observation

Bagging Classifier

Observation

Model building - Boosting

AdaBoost Classifier

Observation

Gradient Boosting Classifier

Observation

XGBoost Classifier

Observation

Model performance improvement (Hyperparameter Tuning) - Boosting

AdaBoost Classifier

Observation

Gradient Boosting Classifier

Observation

XGBoost Classifier

Observation

Stacking Classifier

Observation

Comparing all models

Observation

Actionable Insights & Recommendations

Insights

Recommendations